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The past few years have seen considerable progress in algorithmic development for the generation 
of gauge fields including the effects of dynamical fermions. The Rational Hybrid Monte Carlo 
(RHMC) algorithm, where Hybrid Monte Carlo is performed using a rational approximation in 
place the usual inverse quark matrix kernel is one of these developments. This algorithm has been 
found to be extremely beneficial in many areas of lattice QCD (chiral fermions, finite temperature, 
Wilson fermions etc.). We review the algorithm and some of these benefits, and we compare 
against other recent algorithm developements. We conclude with an update of the Berlin wall plot 
comparing costs of all popular fermion formulations. 
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1. Introduction 



We start as always with the Lattice QCD path integral 

(Q) = ^J [dU}e- s ^ u) [det^{U)] a a(U) 

where a = ^ (y) for staggered (Wilson) fermions, ^# = M^M with M the discretised Dirac 
operator. This implies we can define a theory with an arbitrary number of fermion flavours N f if 
we are willing to allow a non-integer a parameter. In this regime the conventional Hybrid Monte 
Carlo (HMC) algorithm fails because there is no method where by which we can directly evaluate 
either the action or its variation with respect to the gauge field to evaluate the force. Since current 
dynamical QCD calculations are focused on N, = 2 + 1 we must use other algorithms if we are to 
proceed. 



2. Hybrid Algorithms 

2.1 Hybrid Monte Carlo (HMC) 

Before we describe algorithms for non-integer a, we describe the HMC algorithm which is the 
de facto algorithm for fermion theories where a = 1 . Here we rewrite the fermion determinant in 
terms of pseudo-fermions 

det^ = J D^Dpe~*^~^ = J D(j> e~ St . 

We introduce a fictitious momentum field and define a Hamiltonian H = 4 tr n 2 + S g + S f . With the 
Hamiltonian defined, we can then evolve the gauge field through integrating Hamilton's equations 
of motion: the resulting gauge fields will have the desired probability distribution, 

To ensure ergodicity of the algorithm, the momentum field (and pseudofermion fields) must be 
refreshed periodically, once per trajectory. Integrating Hamilton's equations requires we discretise 
the fictitious time dimension and use a numerical integration scheme with stepsize 8x to evolve the 
gauge field (the so called molecular dynamics (MD)). This introduces an 0(8% ) error to the dis- 
tribution, where k is equal to the order of integration scheme used. This error can be stochastically 
corrected for with the use of a Metropolis acceptance test at the end of the each trajectory. The 
Metropolis acceptance test requires detailed balance, this places two constraints on the integration 
scheme chosen: it must be both reversible and area preserving. These constraints are maintained 
if we use symmetric symplectic integrators, the most simple of these of course being the second 
order leapfrog integrator. Hence the HMC algorithm is an exact algorithm and the results obtained 
are independent of the stepsize, the stepsize is thus chosen to give a non-negligible acceptance rate. 

Each update of HMC consists of 
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• Hybrid Molecular Dynamics Trajectory 

- Momentum refreshment heatbath (P(n) °< e~ n * n l 2 ). 

- Pseudo-fermion heatbath (0 oc M^, where P(E,) °= e~" S) 

- MD trajectory with t/<5t steps 

• Metropolis Acceptance Test _P. 1CC = min(l,e~ 5// ). 

The cost of the HMC algorithm rapidly increases with decreasing fermion mass; there are two 
principle sources for this mass dependence. We have to evaluate the inverse of the Dirac operator 
applied to a vector for each force evaluation, this is typically done using a Krylov solver such as 
conjugate gradient (CG), this cost blows up as the fermion mass decreases because of associated 
the condition number increase. The other reason is because of increase in fermion force magnitude 
which necessitates a decrease in stepsize to maintain a constant acceptance rate. The end result 
is that including the effect of physical dynamical fermions is an unrealistic prospect. Fortunately 
there have been improvements to the basic algorithm which vastly reduce this cost: these shall be 
covered in §g. 

2.2 The R Algorithm 

An alternative to using the pseudofermion formulation of HMC is rewrite the fermionic determinant 
as an exponential trace log, resulting in a fermion effective action 

det^# a = exp (atrln^#) = exp (-5 eff ) . 

The advantage of this method is that we can represent a theory with an arbitrary a parameter. As 
before with the HMC algorithm, we integrate Hamilton's equations to give us the desired proba- 
bility distribution of the gauge fields. However, when it comes down to evaluating the force, the 
presence of the trace in the action requires that we evaluate the explicit matrix inverse. This is 
obviously unfeasible, so the trace is represented by a noisy estimator: the resulting force term is 
equivalent to the pseudofermion force which is present in HMC. 

The problem with this method is that because a noisy estimator for the fermion force has been used, 
a new 0{8x) error is introduced. Through the introduction of an irreversible and non-area preserv- 
ing integration scheme an algorithm with 0(8t 2 ) leading error can be obtained. However, this 
breaks detailed balance so the algorithm cannot be made exact through the inclusion of a Metropo- 
lis acceptance test, hence the algorithm is inexact and any results obtained from the algorithm will 
have a dependence on the integrating stepsize. 

The cost of the R algorithm is naively equal to HMC in that one linear equation solve is required 
per force evaluation, however, strictly speaking an extrapolation to zero stepsize is required. This 
is an expensive and time consuming process. For many years, calculations done using staggered 
fermions with the R algorithm have used the rule of thumb 8t r ~ \m x ; the extrapolation to zero 
stepsize has been rarely carried out. This thus raises a potential question mark over any results 
generated since the size of the stepsize error is unknown. 
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2.3 Polynomial Hybrid Monte Carlo (PHMC) 

Like HMC upon which PHMC is based on, we start by representing the determinant in terms of 
pseudofermions. The kernel in the pseudofermion bilinear is replaced by an optimal polynomial 
approximation to the desired matrix function, 

det^ a = j ®tf®$e-f-*~** « J 9tf9$e-? p W*, 

where P{^) is valid over the spectral range of the operator [Q, Q]. The roots of optimal polynomial 
approximations come in complex conjugate pairs, P{^) = this means that the 

heatbath can easily be evaluated, = p(*rif) T}. 

A polynomial approximation has equivalent convergence properties to a Jacobi iteration. Such 
methods are known to be inferior to Krylov methods, hence there is potential for PHMC to be 
an expensive algorithm compared to HMC (i.e., the polynomial degree n will be much greater 
than N itBt the number of Krylov iterations). One approach to alleviate this problem is to use a low 
degree polynomial to represent the fermion action, and to correct for this by reweighting either 
the acceptance test or any observables by the required determinant ratio. The problem with this 
approach is that the difference between the guidance Hamiltonian and the actual Hamiltonian is 
extensive in the volume; this results in a poorer scaling with volume compared to HMC, however, 
there have been recent improvements in the PHMC algorithm which minimise this effect [|J]. 

For theories with a^l, the derivative must be taken using the Leibniz rule, hence we have to sum 
over n terms, this is costly with respect to memory (we need to simultaneously store n/2 vectors, 
with n ~ 0(100 — 1000)) and there is also increased potential for rounding errors. 

3. Rational Hybrid Monte Carlo (RHMC) 

An alternative to using a polynomial kernel is to use a rational kernel. In figure p] there is a com- 
parison between equivalent polynomial and rational approximations, plotting the relative error as 
a function of approximation degree. Rational approximations have far superior convergence prop- 
erties: over the spectral bounds we are concerned with, we can achieve an approximation to the 
desired function with maximum relative error that is good to machine precision (~ 10~ 15 ) with 
approximation degree O(20) or less (compared to 0(1000) with polynomials). 

Optimal rational approximations are generated using the Remez algorithm [m, the roots and poles 
of which are in general real. Thankfully the poles are also always positive for \a\ < 1 which are 
the functions of interest. When written in partial fraction form (r(x) = E/~=i J+fc) can De 
evaluated using a multi-shift solver [g] : hence the cost evaluating a rational function is essentially 
the same as a single matrix inversion, and to first order the precision is independent of the cost. It 

'Presumably the cheapest method to evaluate the heatbath is to expand the polynomial inverse in terms of partial 
fractions and to evaluate using a multi-shift solver. 
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Degree of approximation 



Figure 1: A comparison of the relative error between rational and polynomial approximation to the inverse 
square root function (spectral bounds = [3 x 10~ 3 , 1]). 

is also worth mentioning that since the coefficients are in general all the same sign (for \a\ < 1) 
the evaluation of rational functions using partial fractions is numerically stable. 

Exactly as was done with PHMC, we rewrite the determinant in terms of pseudofermions, but now 
replace the kernel in the bilinear by a rational approximation, 

with r(x) = x~ a l 2 . Unlike the case of PHMC, we have to enforce the degeneracy condition with a 
rational kernel to allow the heatbath to be evaluated. The great advantage now though is because 
we can include a rational approximation to arbitrary precision there is no requirement to reweight 
the acceptance test Hence we can proceed as exactly as the conventional HMC algorithm. The 
RHMC algorithm thus consists of 

• Hybrid Molecular Dynamics Trajectory 

- Momentum refreshment heatbath (P(n) oc e^ 71 * 71 / 2 ). 

- Pseudo-fermion heatbath ((f) <=c r(^#) _1 §, where P(%) ") 

- MD trajectory with t/5t steps 

• Metropolis Acceptance Test P acc = min(l,e~ 5// ). 



Evaluating the force of this fermion action would result in a double inversion because of the pres- 
ence of the square of the rational function. To get around this a different rational approximation is 
used for the MD: r as ^?~ a « r 2 . In this form, the pseudofermion force is just a sum of HMC like 
terms 

m 

Spf = - £ fi# f + A-r'^'M' + AO" V ■ 
i=i 

The cost of the algorithm is almost identical to that of HMC, in that there is one fermion inversion 
per force evaluation (there is one extra inversion for the heatbath evaluation at the beginning of 
each trajectory). 
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Figure 2: A comparison of the chiral condensate between the R and RHMC algorithms using P4 improved 
staggered fermions (V = 8 3 4, Ni = 3, left panel m = 0.01, right panel m = 0. 1) [B]. 

It is also perhaps worth mentioning that despite the fact that the pseudofermion is held fixed 
throughout the MD trajectory, one cannot use a chronological inverter to reduce the number 
of inversion iterations because of the requirement of the multi-shift solver that the initial guess 
must be zero. This however, represents no handicap in practice since the chronological inverter 
isn't applicable to the R algorithm or PHMC either. 



4. Finite Temperature 

The ideal testing ground for the RHMC algorithm is in the regime of finite temperature QCD. 
Most calculations here are done using staggered fermions, with N, = 2 + 1,3. Traditionally the R 
algorithm has been used and so previous results have had finite stepsize errors. It is of course a 
valid question to ask whether these previous results used a small enough stepsize to ensure that the 
stepsize errors are negligible: is 8t r ~ -^m, a valid prescription for choosing the stepsize? 

The RBC-Bielefeld collaboration use P4 improved staggered fermions for their finite temperature 
calculations. They have carried out various comparisons between results generated using the R and 
RHMC algorithms at different values of the fermion mass [Sj. In figure ||] we can see comparisons 
of the chiral condensate at two different fermion masses. It was found for both fermion masses that 
a stepsize considerably smaller than the 8r R ~ |mj was required to achieve agreement between the 
two algorithms. In all cases they found that RHMC greatly accelerated the process of generating 
gauge fields by use a much larger stepsize (i.e. 8t RHMC 3> 8z R ), this being especially true as the 
fermion mass is brought to zero. As a result of this study, all current gauge field production now 
exclusively uses RHMC. 

When studying the QCD equation of state and the order of the chiral transition one has to subtract 
zero temperature observables from finite temperature observables. Using stout smeared staggered 
fermions to study this behaviour, the Wuppertal-Budapest group have found that the finite stepsize 
error using R algorithm is the same order of magnitude as the typical subtraction (see figure || left 
panel). This makes the use of an exact algorithm mandatory when making such calculations. 
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£ 2 i/(T|V) 



Figure 3: Left panel: a plot of the expectation of the gauge of the gauge action: the deviation of R and 
RHMC at 8t R ~ \m y is comparable to the size of a typical subtraction when calculating the equation of state 
(e = 8t in the plot). Right panel: generated using RHMC demonstrating that the nature of the transition is a 
smooth crossover (stout improved staggered fermions, V = 16 4 , Nf = 2+l, left panel: m K =320 MeV, right 
panel: m K =140 MeV) Q. 
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Figure 4: Left panel: a plot demonstrating the increase in Binder cumulant as 8t r is reduced, tending 
towards the RHMC result. Right panel: the shift in the critical quark mass when switching from R (cross) to 
RHMC (line) (naive Staggered Fermions, N f = 3, V = 8 3 4) | 



The gain in performance from the RHMC algorithm has allowed, for the first time, finite temper- 
ature QCD to be studied using physical fermion masses [[K], O]. This study has revealed that the 
nature of the transition is a smooth crossover (see figure |5| right panel). 

A thorough study of the stepsize errors of the 7? algorithm has been carried out by de Forcrand 



and Philipsen [12]. They have found that the finite stepsize error causes the Binder cumulant to 
increase (see figure |] right panel) similar to other previous studies Jl3|], and again find significant 
cost reduction from switching from R to RHMC. More significantly at 8l R = \m x , the critical 
fermion mass is overestimated by 33% compared to the exact result. This corresponds to a 25% 
increase in the renormalised fermion mass. The finite stepsize error is present in renormalised 
quantities, this is in direct contradiction of the statement in p4||. 



The only possible conclusion that can be made from these finite temperature studies is that an exact 
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algorithm isn't just desirable from a cost point of view, but it is mandatory to ensure that the physics 
obtained is free from systematic error. 



5. Multiple Pseudofermions 
5.1 Mass Preconditioning 

A large gain in performance has been found in the technique of mass preconditioning the fermion 



determinant [15] 



det(M t M) = det(M f M)det (M _1 (M t M)/#~ t ) 

with m(M) > m(M). This product of determinants can then represented using two pseudofermions, 
with kernels (M^M)~ l and M(M^ M)~ l W respectively. Using multiple pseudofermions to repre- 
sent the determinant is beneficial because it reduces the fluctuations in the fermion force through 
better sampling of the Gaussian integral. Compared to the standard formulation there is a relatively 
small increase in naive cost because of the need to invert the extra operator in the second bilinear, 
but the use of multiple pseudofermions allows for a large increase in the stepsize: the end result 
being a net gain in performance. 

The initial strategy for mass preconditioning was to tune the dummy operator's mass parameter 
such that 



i.e., tune such that the condition number of the two kernels are equal Q16Q. This can gain up to 
a factor 2 improvement though the technique can be extended to introduce more pseudofermions 
potentially leading to larger performance gains. 

5.2 Multi-timescale Mass Preconditioning 

An alternative to the strategy suggested above is to tune the additional dummy operators such that 



the most expensive force contributes the least magnitude to the total force Q17| , |18Q (see figure 



|D left panel). This strategy then leads to large gains when used in combination with a multi- 



timescale (nested) integrator []19[]: as the fermion mass is brought to zero, the gain is found to be 
around a factor 10 over the basic HMC algorithm at current light fermion parameters (see figure 
|5] right panel). It is worth mentioning that this achieves at least the performance of the domain 



decomposition [20], but is much easier to implement. 



5.3 Multiple Pseudofermions with RHMC 

We can trivially rewrite fermion determinant 



det^ = [det^ 1 /"]" °< \[d$j d(j>] exp (-</))^ 1 /> ; -J , 

7=1 
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Figure 5: Left panel: the magnitude of the forces present in a multi-timescale mass preconditioned simula- 
tion: the most expensive force (F%) contributes the least to the total force and so can be included less often 
in the integration. Right panel: cost to generate 1000 independent configurations for Wilson fermions using 
mass preconditioning (blue squares) and conventional HMC (red line) compared to staggered fermions with 
the R algorithm (Wilson fermions, V = 24 3 32, N f = 2, j3 = 5.6) rfTJ 
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Figure 6: A plot of magnitude of 8H versus 
the stepsize for n — 1,2,3 pseudofermions with 
RHMC. The instability is triggered only for n = 
1 (staggered fermions, V = 16 4 , j3 = 5.6, Nt = 2, 
m = 0.005) [||. 



Figure 7: A plot of the number of conjugate gra- 
dient (CG) iterations required per pole to con- 
verge in the multi-shift solver and of the relative 
force contribution arising from each for n = 2,3 
pseudofermions (Wilson fermions Nf = 2, j8 = 
5.6, K = 0.15800, V = 24 3 x 32) [g. 



this is the so called n th root trick, and this action can easily be simulated using RHMC [21 , 22|]. Like 
the multiple pseudofermion technique of Hasenbusch, this alternative method leads to a speedup 
through an increase in stepsize. This method has the added virtue that there are no dummy mass 
parameters to tune, so it easy to extend to arbitrary n. For this approach, since all fermion inversions 
involve the same light kernel a single timescale is used for the fermions. 

Before we answer the question as to which of the multiple pseudofermion methods leads to the 
best performance, it is interesting to look at the question of integrator instability. On figure || 
we have plotted the magnitude of 8H as a function of stepsize. With n = 1 pseudofermions we 
can see that a point is reached where by the instability in the integrator is triggered, and there 



9 



The Rational Hybrid Monte Carlo Algorithm 



M. A. Clark 



is catastrophic breakdown of Hamiltonian conservation. This instability is caused by the light 
fermion modes [[23|] which have magnitude ~ 0{\). It is the presence of the instability that 
results in higher order integrators being detrimental for HMC: this is because such integrators are 
constructed out of leapfrog sub-steps which are longer than the overall step and are of course 
susceptible to the instability. What happens to this picture when multiple pseudofermions are 
used? On the same figure we have also included the equivalent results for n = 2,3. We can see 
that the improved Hamiltonian conservation from using multiple-pseudofermions as expected, but 
crucially the integrator instability has been removed. This is because the low modes which cause 
the instability, which were 0{\) are now 

( l y/n xhus 

since the limiting light modes have 
effectively been removed, we expect to gain significantly from the use of higher order integrators: 
this is indeed found to be the case, we have found a gain of a factor 2 switching from second order 
to fourth order integrators The switch to a higher order integrator is also expected to lead to 
improved volume scaling, i.e., the expected scaling of HMC with a second order integrator is V 5 / 4 , 
where as with a fourth order integrator it is V 9 / 8 [24]. 



5.3.1 Partial Fraction Forces 



It was noted in equation |3T] that the forces are merely a sum of HMC like force terms. Each of these 
forces is expected to have a different magnitude, and so contribute differing amounts to the total 
fermion force. Figure [7] is plot of the break down of the magnitude of the force originating from 
each partial fraction for n = 2,3 pseudofermions. The somewhat surprising result is that the terms 
which contribute the least to the total fermion force are those which cost the most (compare with 
the conjugate gradient (CG) iteration count). The force distribution arises primarily because of the 
nature of the rational coefficients. We would hope that we can make the most of this observation, 
and indeed we can: we loosen the CG tolerance on the less well conditioned shifts, i.e., those which 
contribute the least to the fermion force. This can strategy can gain up to a factor of 2 reduction 
in total CG iterations per trajectory. Of course the tolerance is not loosened for the heatbath and 
Metropolis evaluation to avoid any bias being introduced. As n is increased this effect is enhanced 
as can be seen on the plot. 



5.4 Who's the fastest of them all? (N t = 2 Wilson) 

We now compare the two multiple pseudofermion techniques presented so far in this paper: mass 
preconditioning and n th roots. To do this comparison the parameters and mass preconditioning 



results from [18] were used. The measure of efficiency that was adopted in this work was used 
here: the product of the autocorrelation length of the plaquette with the number of matrix 
vector products per trajectory N mv . 

As can be seen in table 111, the cost of the two algorithms is extremely similar and there is little to 
chose between the two. As is always the case, it would be interesting the compare the algorithms' 
relative performance as the chiral limit is taken. Given that the single timescale n th root formulation 
is using a higher order integrator, we would expect superior volume scaling. It would also be 
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K 


RHMC [g] 


Mass preconditioning [18] 


HMC [25] 


0.15750 


9.6 


9.0 


19.1 


0.15800 


19.1* 


17.4 


128 


0.15825 


52.5* 


56.5 





Table 1: A comparison of the costs of RHMC, mass preconditioning and plain HMC (Wilson fermions, 
V = 24 3 .32, j3 = 5.6, *Using 4MN5 fourth order integrator [§|]) 



i.!. wis? 
0.60845 - 

0.6084 - 
0.60835 - 
u 0.6083 - 
0.60825 - 
Oh 0.6082 - 
0.60815 - 

0.60SI - 
0.60805 - 

o.isoa - 



R algorithm 
RHMC result 

1 aSx" + b 



J-"' 



Figure 8: Left panel: R and RHMC plaquette values, the extrapolation to zero stepsize is clearly consistent 
with the RHMC result. Right panel: The plaquette autocorrelation length of R and RHMC (domain wall 
fermions, V = 16 3 x 32 x 8, N f = 2 + l,j8 = 0.72, m, = 0.02, m s = 0.04, 8z RHMC = 0.02) [Q. 

interesting to investigate whether the use of higher order integrators brings the same benefit to 
mass preconditioning. 



6. Algorithmic Improvement with N t = 2 + 1 
6.1 Domain Wall Fermions 

The first large production runs of RHMC were that of the N f = 2 + 1 program by the RBC-UKQCD 



collaboration \ y.7\ [28]]. As a first test of the algorithm, the standard RHMC algorithm was compared 
against the R algorithm. Figure |8] is taken from this initial study, where in the left panel the R 
algorithm's plaquette measurement is seen to extrapolate to the RHMC result at zero stepsize. A 
comparison of the algorithms' autocorrelation length revealed the expected difference due to the 
non-unit acceptance rate of RHMC. With this initial study completed, it was decided to use RHMC 
for all subsequent running. 

A somewhat obvious observation can be made when simulating a N f = 2 + 1 theory. For the current 
case of domain wall fermions, we write down the 2+1 flavour determinant 

/ detM^M, \ / detM s f M s \ 1/2 _ / detAf/AfA / detMJM, x 3/2 
^detMtM^ ) VdetMtM p J ~ \detMjM s J VdetMt^ 
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where in the right hand side we have trivially rewritten the determinant product. What this suggests 
is that we should mass precondition the light quark by the strange quark. Unlike the two flavour 
case, since we already have the heavier fermion present, there is no extra overhead from doing mass 
preconditioning. As is suggested in [18] we can then use a multiple timescale integration scheme 
(gauge, triple strange, preconditioned light pair) to make maximum use of the mass precondition- 
ing. This strategy is found to work extremely well, and leads to around a large reduction in CG 
cost [29]. It is interesting to note that in this arrangement, the light quark inversions take up around 
10% of the total CG cost, and thus the mass dependence of the cost has drastically reduced since 
generally the strange quark is held fixed. 



6.2 AsqTad Fermions 

Apart from finite temperature QCD calculations the other obvious target for the RHMC algorithm 
is the large scale AsqTad calculations that are done by the MILC collaboration. A significant cost 
reduction here would correspond to a saving of many Teraflop years of computer time. 

Following on from the idea of mass preconditioning using the strange quark developed in the 
previous section, here we do the same but with AsqTad fermions, 

det(^)2detL# s )* = ( det ^H 2 det(^l) 1 - 
\det(^ s ) / 

With staggered kernels, the mass is simply a multiple of the identity, this leads to a particularly 
simple form of fermion action since we can represent the ratio function by a single rational approx- 



imation [30] 



= ^\z^ x — J ft + &^ & 



= 0/r 2 (^)0, + 0y(.# s )</> s . 

In figure |9] we have plotted the breakdown of the force magnitudes before and after the mass precon- 
ditioning. The effect from the preconditioning is apparent in the plot: the light quark contribution is 
reduced by an order of magnitude and the strange quark increases by a factor of 3, as would be ex- 
pected. Again, we can take advantage of this through the use of a multiple timescale integrator. The 
mass preconditioning allows for a vast reduction in the CG cost of producing AsqTad ensembles, 
but this does not correspond to a likewise reduction in the total operation count: the triple strangle 
is the dominant cost as expected but this cost does not come from CG iterations, rather from the 



expensive operator derivative calculation that is required with AsqTad fermions [31]. Thus any 
further mass preconditioning is detrimental to performance because although it may reduce the CG 
count, it introduces more operations since each timescale requires a separate derivative calculation. 
The solution to this problem is to use the n th root method the triple strange. This introduces more 
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• N f =2+lDWF(L s tl6) 
2 Wilson 
2 Clover 
2TM 

2+1 AsqlailR 
2+1 AsqwdRHMC 
2+1 Clover 



N 



Regular Improved 



Figure 9: Comparison of the relative force mag- 
nitudes before and after preconditioning the light 
quark by the strange (AsqTad fermions, (V = 4 4 , 
N t = 2+1, m, = 0.01, m s = 0.05, J3 = 6.76) [M. 



Figure 10: An updated Berlin wall plot compar- 
ing the costs of various fermion formulations. 



pseudofermions all on a single timescale, allowing each field to reuse the same derivative calcula- 
tion, greatly reducing the cost, while increasing the stepsize of the triple strange timescale. Thus 
the optimum solution uses mass preconditioning and the n ,h root trick. When this improved tech- 
nique is tested at current MILC parameters (V = 24 3 x 64, j8 = 6.76, m x = 0.005, m s = 0.05) against 
the 7? algorithm it is found to lead to a factor 7-8 reduction in the operation count per trajectory, 



and of course the results generated are exact [30]. 



7. Berlin Wall Plot 



The standard method for demonstrating the mass scaling cost of lattice fermion formulations is to 
use the so called Berlin wall plot. On the x-axis is plotted the ratio of the pion and rho masses and 
on the y-axis is the cost in Teraflop years to generate 1000 independent gauge configurations. With 
all of the algorithmic improvements that have been applied to QCD in recent years, an updated 
Berlin wall plot is necessary to compare the costs of all the fermion formulations. For this plot, 
the following formulations (and algorithms) have been included: N t = 2 + 1 DWF RHMC [f-O], 



N f = 2 mass preconditioned Wilson [|18|], N f = 2 mass preconditioned clover []33[], N f = 2 + 1 mass 
preconditioned clover + RHMC [|34|], N f = 2 mass preconditioned twisted mass [p5[], N f = 2 + 1 
AsqTad R @] and N f = 2 + 1 AsqTad RHMC 



Since these various results have been run at different lattice spacing and volumes, the cost data must 
be scaled to be consistent. Hence, all data has been scaled to V = 24 x 40, a = 0.08. A word of 
warning about this box size and lattice spacing: this box is far too small and coarse to realistically 
represent the physical point, so the scaling behaviour as the physical point is approached should be 
taken with the requisite amount of salt. 



The updated Berlin wall plot can be seen in figure 1C. The cost of using domain wall fermions is 
around the expected factor of L s times more expensive than the Wilson formulations. All Wilson 
formulations would appear to have a similar cost and scaling. What is also evident from the plot is 
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that before switching to RHMC, the cost of using AsqTad fermions is significantly more than that 
of mass preconditioned Wilson fermions. When mass preconditioned RHMC is used the cost of 
AsqTad fermions seems to fall down to the level of Wilson fermions. The main point to be drawn 
from this plot though is that the mass dependences is much weaker for the improved algorithms 
than is suggested in [37]. 



8. Summary 

In this paper we have described the exact Rational Hybrid Monte Carlo algorithm and explained 
how it can be used for fermion theories where a non-integer power of the determinant is required. 

In the regime of finite temperature QCD studies, where having an exact algorithm is crucial, the 
RHMC algorithm was shown to be far superior to the R algorithm. Indeed, since using an exact 
algorithm is cheaper than an inexact one, there appears to be no need for the continued use of the 
R algorithm. 

With the use of the n tb root method of pseudofermions, RHMC is also an effective algorithm for 
theories with an integer power of the determinant. It was shown that the use of multiple pseud- 
ofermions removes the integrator instability, this mean that the use of higher order integrators can 
be very beneficial. Both this method, and that of mass preconditioning lead to large gains in per- 
formance compared to conventional HMC. These techniques can even be combined, and the use 
of RHMC allows one to use the strange quark to precondition the light quarks if N f = 2 + 1 is 
considered. 

The use of improved algorithms has greatly reduced the mass dependence of the cost of generating 
gauge configurations. Although we are not yet at a stage where we can simulate using physical 
quark masses, the possibility of doing so does not now lie that far out of reach. 
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